
[DPE-2119] fix issues and tests when reusing storage #272

Merged: 137 commits into main from DPE-2119-attach-existing-storage on Jun 11, 2024

Conversation

@reneradoi (Contributor) commented Apr 30, 2024

Issue

When attaching existing storage to a new unit, two issues occur:

  • the snap install fails because of permissions/ownership of directories
  • snap_common gets completely deleted

Solution

  • bump the snap version and use the fixed one (the fixed revision is 47; this is already outdated, as a newer version of the snap was merged to main prior to this PR). A sketch of revision pinning follows this list.
  • enhance test coverage for the integration tests
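
For illustration only, a hypothetical sketch of pinning a snap revision with the operator_libs_linux v2 snap library. This is not the PR's actual change; the snap name "opensearch" and revision "47" follow the description above:

from charms.operator_libs_linux.v2 import snap

def install_pinned_snap() -> None:
    cache = snap.SnapCache()
    opensearch = cache["opensearch"]
    # Install (or move to) the fixed revision, then hold it so automatic
    # refreshes cannot pull the snap off the pinned revision.
    opensearch.ensure(snap.SnapState.Present, revision="47")
    opensearch.hold()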

Integration Testing

Tests for attaching existing storage can be found in tests/integration/ha/test_storage.py. There are now three test cases (a sketch of the pattern they share follows the list):

  1. test_storage_reuse_after_scale_down: remove one unit from the deployment, then add a new one that re-uses the storage of the removed unit. Check that the continuous writes are ok and that a test file created initially is still there.
  2. test_storage_reuse_after_scale_to_zero: remove both units from the deployment while keeping the application, then add two new units that use the storage again. Check the continuous writes.
  3. test_storage_reuse_in_new_cluster_after_app_removal: from a cluster of three units, remove all of them and remove the application. Deploy a new application (with one unit) to the same model, attach the storage, then add two more units with the other storage volumes. Check the continuous writes.
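
For illustration, a minimal sketch of the pattern these tests share, assuming pytest-operator's ops_test fixture. The application name, storage ID, and test function are placeholders, not the PR's actual code:

import subprocess

import pytest

APP_NAME = "opensearch"  # hypothetical application name

@pytest.mark.abort_on_fail
async def test_storage_reuse_sketch(ops_test) -> None:
    # Pick a unit and note a storage ID to re-use later.
    unit = ops_test.model.applications[APP_NAME].units[0]
    storage_id = "opensearch-data/0"  # placeholder; read from `juju storage`

    # Removing a unit detaches its storage by default; the volume survives.
    await ops_test.model.destroy_units(unit.name)
    await ops_test.model.wait_for_idle(apps=[APP_NAME])

    # Add a new unit that re-attaches the detached volume. python-libjuju has
    # no --attach-storage equivalent, so the juju CLI is invoked directly.
    subprocess.run(
        ["juju", "add-unit", "-m", ops_test.model_full_name,
         APP_NAME, "--attach-storage", storage_id],
        check=True,
    )
    await ops_test.model.wait_for_idle(apps=[APP_NAME], status="active")
    # The real tests then verify continuous writes and the initial test file.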

Other Issues

As part of this PR, another issue is addressed: #306. It is resolved with commit 19f843c.

@reneradoi reneradoi marked this pull request as ready for review April 30, 2024 15:11
@reneradoi (Contributor, Author) commented Apr 30, 2024

This PR updates the installed snap to the fixed version, which can install on existing storage and does not overwrite the data in snap_common.

Other content of this PR:

The integration test test_storage_reuse_after_scale_down is now working. I've:

  • created a storage pool on lxd (see the sketch after this list)
  • deployed the application with storage from that pool
  • adjusted the test workflow a little to make it more robust
  • verified that re-attaching storage to a new unit now works correctly
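
For illustration, a rough sketch of that setup; the pool name, storage label, and volume size are assumptions, not necessarily the PR's values:

import subprocess

def deploy_with_storage_pool() -> None:
    # Create a storage pool named "opensearch-pool" backed by the lxd provider.
    subprocess.run(
        ["juju", "create-storage-pool", "opensearch-pool", "lxd"],
        check=True,
    )
    # Deploy the application with one 2 GiB volume taken from that pool.
    subprocess.run(
        ["juju", "deploy", "opensearch",
         "--storage", "opensearch-data=opensearch-pool,2G"],
        check=True,
    )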

The test case test_storage_reuse_in_new_cluster_after_app_removal is being skipped. Currently this test fails every time; I first need to understand how it is supposed to work before fixing it.

@Mehdi-Bendriss (Contributor) left a comment:

Thanks René! Could we add a test which does the following (see the sketch after this list)?

  • stop continuous writes
  • scale down to 0 units (while keeping the storage around)
  • scale up to N units with storage attached
  • check that the continuous-writes data has indeed been kept and is usable
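
A sketch of the requested flow, under the same assumptions as the sketch above (pytest-operator's ops_test fixture, juju CLI via subprocess; all names are illustrative):

import subprocess

async def scale_to_zero_and_back(ops_test, app_name: str, storage_ids: list[str]) -> None:
    # Remove every unit; the application and its detached volumes remain.
    units = ops_test.model.applications[app_name].units
    await ops_test.model.destroy_units(*(unit.name for unit in units))
    await ops_test.model.wait_for_idle(apps=[app_name], wait_for_exact_units=0)

    # Re-add one unit per retained volume, re-attaching each of them.
    for storage_id in storage_ids:
        subprocess.run(
            ["juju", "add-unit", "-m", ops_test.model_full_name,
             app_name, "--attach-storage", storage_id],
            check=True,
        )
    await ops_test.model.wait_for_idle(apps=[app_name], status="active")
    # The continuous-writes data would then be checked for completeness.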

@phvalguima (Contributor) left a comment:

Looks good! Let's give this logic a try with the integration tests and see how it goes.

@reneradoi reneradoi requested a review from phvalguima June 3, 2024 12:51
@phvalguima (Contributor) left a comment:

Hey @reneradoi, I had a similar issue to yours, where there were units initially available for shards but the shards were still hanging as "UNASSIGNED". Although that was not about the lock itself, I think we can troubleshoot your problem here the same way.

I've added this logic here to get more logs. You would have to tweak the "explain" API call to target the charm lock itself.

@reneradoi reneradoi requested a review from phvalguima June 10, 2024 14:35
phvalguima previously approved these changes Jun 10, 2024

@phvalguima (Contributor) left a comment:

LGTM. My only remark is that we need to dig more into what is causing the 503.

In a separate PR, we should log 503 errors and their explanation, as follows:

GET _cluster/allocation/explain
{
  "index": ".charm-node-lock"
}
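
For illustration, a minimal sketch of that follow-up in Python using a plain requests client. The helper name and host default are made up; note the allocation-explain API also expects "shard" and "primary" alongside "index", so both are filled in here:

import logging

import requests

logger = logging.getLogger(__name__)

def log_lock_allocation_explain(host: str = "https://localhost:9200") -> None:
    response = requests.get(
        f"{host}/_cluster/allocation/explain",
        json={"index": ".charm-node-lock", "shard": 0, "primary": True},
        verify=False,  # integration environments often use self-signed certs
        timeout=10,
    )
    logger.error("allocation explain for .charm-node-lock: %s", response.text)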

@reneradoi (Contributor, Author) replied:
Yes, that's true. I'll open another issue or a draft PR for this and think it through.

@reneradoi reneradoi merged commit 070182a into main Jun 11, 2024
21 of 28 checks passed
@reneradoi reneradoi deleted the DPE-2119-attach-existing-storage branch June 11, 2024 12:57